A Novel Approach for Ontology-Based Feature Vector Generation for Web Text Document Classification

Authors

  • Mohamed K. Elhadad
  • Khaled M. Badran
  • Gouda I. Salama
Abstract

The task of extracting the feature vector used in mining tasks (classification, clustering, etc.) is considered the most important task for enhancing text processing capabilities. This paper proposes a novel approach to be used in building the feature vector used in the web text document classification process; adding semantics in the generated feature vector. This approach is based on utilizing the benefit of the hierarchical structure of the WordNet ontology, to eliminate meaningless words from the generated feature vector that have no semantic relation with any of the WordNet lexical categories; this leads to the reduction of the feature vector size without losing information on the text, also enriching the feature vector by concatenating each word with its corresponding WordNet lexical category. For mining tasks, the Vector Space Model (VSM) is used to represent text documents and the Term Frequency Inverse Document Frequency (TFIDF) is used as a term weighting technique. The proposed ontology based approach was evaluated against the Principal Component Analysis (PCA) approach, and against an ontology based reduction technique without the process of adding semantics to the generated feature vector using several experiments with five different classifiers (SVM, JRIP, J48, Naive-Bayes, and kNN). The experimental results reveal the effectiveness of the authors' proposed approach against other traditional approaches to achieve a better classification accuracy, F-measure, precision, and recall.
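The pipeline described in the abstract (drop words with no lexical category, append the category to each surviving word, then weight terms with TF-IDF) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dictionary `TOY_LEXICAL_CATEGORIES` is a hypothetical stand-in for a real WordNet lookup, the `word_category` joining format is an assumption, and the TF-IDF variant (relative term frequency times log(N/df)) is one common choice among several.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet lookup: word -> lexical category.
# A real system would query WordNet; these entries are illustrative only.
TOY_LEXICAL_CATEGORIES = {
    "dog": "noun.animal",
    "cat": "noun.animal",
    "run": "verb.motion",
    "bank": "noun.group",
    "money": "noun.possession",
}


def enrich_tokens(tokens, categories=TOY_LEXICAL_CATEGORIES):
    """Drop tokens with no lexical category; tag the rest as word_category."""
    return [f"{t}_{categories[t]}" for t in tokens if t in categories]


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Two tiny "documents"; "xyzzy" and "qwerty" have no category and are dropped.
raw_docs = [
    "dog run xyzzy cat".split(),
    "bank money qwerty bank".split(),
]
enriched_docs = [enrich_tokens(doc) for doc in raw_docs]
vectors = tfidf_vectors(enriched_docs)
```

In this sketch the filtering step shrinks each document's vocabulary before weighting, and the appended category keeps the semantic tag attached to each term inside the vector space model.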

Similar resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase in the number of documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the large size of the feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...

A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...

A New Text Mining Method for Extracting User Context Information to Improve the Ranking of Search Engine Results

Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...

Automatic Workflow Generation and Modification by Enterprise Ontologies and Documents

This article presents a novel method and development paradigm that proposes a general template for an enterprise information structure and allows for the automatic generation and modification of enterprise workflows. This dynamically integrated workflow development approach utilises a conceptual ontology of domain processes and tasks, enterprise charts, and enterprise entities. It also suggests...

Journal:
  • IJSI

Volume 6  Issue 

Pages  -

Publication date: 2018